Measuring style with the authorship ratio: An invariant metric of lexical similarity

Authors

  • Edward J. L. Bell
  • Damon Berridge
  • Paul Rayson
Abstract

Stylometry is the study of the computational and mathematical properties of style. The aim of a stylometrist is to derive stylometrics, and models based upon those metrics, to quantitatively gauge stylistic propensities. This paper presents a method of formulating a stylistic distance function via a weighted ratio of lexical stylometrics; the higher the ratio, the more the styles diverge. The coefficients of the distance function are estimated using Powell’s conjugate gradient method (Powell, 1964) on a 4-million-word corpus of 19th-century literature. The distance metric proves accurate over 30,000 binary comparisons and rivals the discernment aptitude of established techniques (Labbé and Labbé, 2001). Previous metrics have suffered from sample-size dependencies; the metric proposed here is resilient to such bias.
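To make the idea of a weighted ratio of lexical stylometrics concrete, here is a minimal Python sketch. It is not the paper's authorship ratio: the abstract gives neither the features nor the functional form, so the marker-word list, the additive smoothing, the max/min ratio and the uniform default weights are all assumptions chosen for illustration, and the sketch makes no claim to the sample-size resilience reported for the real metric.

```python
import re
from collections import Counter

# Hypothetical feature set: the abstract does not list the paper's stylometrics.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in")


def relative_frequencies(text, words=FUNCTION_WORDS, smoothing=1.0):
    """Smoothed relative frequency of each marker word in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    # Additive smoothing keeps every frequency strictly positive,
    # so the ratios below never divide by zero.
    total = len(tokens) + smoothing * len(words)
    return [(counts[w] + smoothing) / total for w in words]


def ratio_distance(text_a, text_b, weights=None):
    """Weighted mean of per-feature max/min ratios (assumed form).

    Returns 1.0 for identical profiles and grows as the profiles diverge.
    """
    fa = relative_frequencies(text_a)
    fb = relative_frequencies(text_b)
    if weights is None:
        # Assumed uniform weights; the paper estimates its coefficients.
        weights = [1.0] * len(fa)
    ratios = [max(x, y) / min(x, y) for x, y in zip(fa, fb)]
    return sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
```

Under these assumptions a text compared with itself scores exactly 1.0, and the score is symmetric in its two arguments. In the paper the coefficients are estimated from the corpus with Powell's method rather than fixed by hand, so the `weights` argument here is only a placeholder for that fitted vector.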

Related resources

Translation Invariant Approach for Measuring Similarity of Signals

In many signal processing applications, an appropriate measure to compare two signals plays a fundamental role in both implementing the algorithm and evaluating its performance. Several techniques have been introduced in literature as similarity measures. However, the existing measures are often either impractical for some applications or they have unsatisfactory results in some other applicati...

Measuring Conceptual Distance Using WordNet: The Design of a Metric for Measuring Semantic Similarity

This paper describes the development of a metric for measuring the semantic distance or similarity of words using the WordNet lexical database. Such a metric could be of use in development of search engines and text retrieval systems, tasks for which the richness of natural language can cause difficulty. Further, such a metric can prove invaluable to psycholinguists who wish to study lexical se...

Accuracy and robustness in measuring the lexical similarity of semantic role fillers for automatic semantic MT evaluation

We present larger-scale evidence overturning previous results, showing that among the many alternative phrasal lexical similarity measures based on word vectors, the Jaccard coefficient most increases the robustness of MEANT, the recently introduced, fully-automatic, state-of-the-art semantic MT evaluation metric. MEANT critically depends on phrasal lexical similarity scores in order to automat...

Style based Authorship Attribution on English Editorial Documents

The aim of authorship attribution is to identify the author(s) of unknown document(s). Every author has a unique writing style. The present paper identifies the unique style of an author using lexical stylometric features. The lexical feature vectors of various authors are used in supervised machine learning algorithms to predict the author of an unknown document. The highest ...

Publication date: 2009